Vanilla Classifiers for Distinguishing between Similar Languages
Abstract
In this paper, we describe the UniBuc-NLP team's submission to the Discriminating between Similar Languages (DSL) Shared Task 2016. We present and analyze the results we obtained in the closed track of sub-task 1 (similar languages and language varieties) and sub-task 2 (Arabic dialects). For sub-task 1 we used a logistic regression classifier with tf-idf feature weighting, and for sub-task 2 a character-based string kernel with an SVM classifier. Our results show that good accuracy can be obtained with limited feature and model engineering. Although the approach has certain limitations, it worked surprisingly well on out-of-domain social media data, achieving 0.898 accuracy (3rd place) on dataset B1 and 0.838 accuracy (4th place) on dataset B2.
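The abstract does not say which character-based string kernel was used for sub-task 2, so the sketch below assumes the p-spectrum kernel, one common choice: two texts are compared by the dot product of their character n-gram count vectors, then cosine-normalized so that text length does not dominate. The n-gram length of 3 and all function names are hypothetical. The resulting Gram matrix is what an SVM with a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`) would consume; the SVM itself is not shown.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int) -> Counter:
    """Count every contiguous character n-gram (substring of length n) in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def spectrum_kernel(a: str, b: str, n: int = 3) -> int:
    """p-spectrum kernel: dot product of the n-gram count vectors of a and b."""
    ca, cb = char_ngrams(a, n), char_ngrams(b, n)
    # Iterate over the smaller profile; a Counter returns 0 for missing keys.
    if len(ca) > len(cb):
        ca, cb = cb, ca
    return sum(count * cb[gram] for gram, count in ca.items())


def normalized_kernel(a: str, b: str, n: int = 3) -> float:
    """Cosine-normalize so that k(x, x) == 1, removing the effect of text length."""
    denom = sqrt(spectrum_kernel(a, a, n) * spectrum_kernel(b, b, n))
    return spectrum_kernel(a, b, n) / denom if denom else 0.0


def gram_matrix(texts: list[str], n: int = 3) -> list[list[float]]:
    """Pairwise normalized kernel values, the input to a precomputed-kernel SVM."""
    return [[normalized_kernel(x, y, n) for y in texts] for x in texts]


if __name__ == "__main__":
    docs = ["el coche rojo", "el carro rojo", "o carro vermelho"]
    for row in gram_matrix(docs):
        print([round(v, 3) for v in row])
```

Because the kernel only needs substring counts, it works directly on raw text (including Arabic script) without tokenization, which fits the limited-feature-engineering spirit described in the abstract.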
Similar sources
The NRC System for Discriminating Similar Languages
We describe the system built by the National Research Council Canada for the "Discriminating between similar languages" (DSL) shared task. Our system uses various statistical classifiers and makes predictions based on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. Language groups are predicted using a generative classi...
Effective Learning to Rank Persian Web Content
Persian is one of the most widely used languages on the Web. Hence, the Persian Web contains invaluable information that needs to be retrieved effectively. As with other languages, ranking algorithms for Persian Web content face various challenges, such as applicability in real-world situations and the lack of user modeling. CF-Rank, as a ...
Discriminating Similar Languages: Evaluations and Explorations
We present an analysis of the performance of machine learning classifiers on discriminating between similar languages and language varieties. We carried out a number of experiments using the results of the two editions of the Discriminating between Similar Languages (DSL) shared task. We investigate the progress made between the two tasks, estimate an upper bound on possible performance using e...
Experiments in Discriminating Similar Languages
We describe the system built by the National Research Council (NRC) Canada for the 2015 shared task on Discriminating between similar languages. The NRC system uses various statistical classifiers trained on character and word n-gram features. Predictions rely on a two-stage process: we first predict the language group, then discriminate between languages or variants within the group. This year,...
Investigating Effect of Olfactory Stimulation by Vanilla on the Rate of Apnea Attacks in Neonates with Apnea of Prematurity: A Randomized Clinical Trial
Background: Apnea of prematurity (AOP) is a developmental disorder that frequently affects premature newborns. One of the new non-drug methods for controlling apnea attacks is olfactory stimulation. The aim of this study was to determine the effect of olfactory stimulation by vanilla on the rate of apnea attacks in neonates with AOP. Materials and Methods: This study is a single-blind random...
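The two-stage process described in the NRC entries above (first predict the language group, then the variant within that group) can be sketched as follows. This is only an illustration of the hierarchy, not the NRC system: the NRC group classifier is described as generative, whereas here both stages use a hypothetical toy nearest-centroid classifier over character trigram profiles. The training sentences, labels, and group names are invented for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def profile(text: str, n: int = 3) -> Counter:
    """Character n-gram counts of a text (hypothetical feature choice)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Toy stand-in: predicts the label whose summed n-gram profile is most similar."""

    def fit(self, texts: list[str], labels: list[str]) -> "CentroidClassifier":
        self.centroids: dict[str, Counter] = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.centroids[label].update(profile(text))
        return self

    def predict(self, text: str) -> str:
        p = profile(text)
        return max(self.centroids, key=lambda label: cosine(p, self.centroids[label]))


def fit_two_stage(texts: list[str], variants: list[str], group_of: dict[str, str]):
    """Stage 1 learns language groups; stage 2 trains one variant classifier per group."""
    groups = [group_of[v] for v in variants]
    group_clf = CentroidClassifier().fit(texts, groups)
    variant_clfs = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        variant_clfs[g] = CentroidClassifier().fit(
            [texts[i] for i in idx], [variants[i] for i in idx]
        )
    return group_clf, variant_clfs


def predict_two_stage(text: str, group_clf, variant_clfs) -> tuple[str, str]:
    """Predict the group first, then discriminate only among that group's variants."""
    group = group_clf.predict(text)
    return group, variant_clfs[group].predict(text)


if __name__ == "__main__":
    train = ["el perro come carne", "o cachorro come carne", "pas jede meso", "mačka pije mlijeko"]
    labels = ["es", "pt", "bs", "hr"]
    group_of = {"es": "ibero", "pt": "ibero", "bs": "slavic", "hr": "slavic"}
    g_clf, v_clfs = fit_two_stage(train, labels, group_of)
    print(predict_two_stage("o cachorro come carne", g_clf, v_clfs))
```

Splitting the decision this way lets the second stage focus on the fine-grained differences between closely related varieties, since it never has to consider languages from other groups.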